In order to determine how to properly augment and fit models with our data, the fist crucial step is to identify time series components such as trends and seasonality. This will allow us to fit models that account for all components present in the data, separating these advanced models from more naïve methods. Thus, this first section will re-iterate some visualizations seen on the previous page for the purpose of identifying components simply by observing time series data.
Note: Much of the code and process on this page is re-purposed from:
Gamage, P. (2026). Applied Time Series for Data Science.
fig <- ridership %>%plot_ly(x =~`Quarter-Year`,y =~`Total Ridership (000s)`,type ='scatter',mode ='lines',showlegend = F )%>%layout(title =list(text="Quarterly Total Public Transit Ridership"),xaxis =list(title="Date"),yaxis =list(title="Public Transit Trips (000s)"))fig
This plot suggests that there is an upward trend that is relatively consistent until 2020, followed by a steep downturn. There is also likely seasonality, with Q1 generally having lower ridership and Q3 having higher.
Code
fig <- mta %>%plot_ly(x =~Date,y =~`total_ridership`,type ='scatter',mode ='lines',name ="Daily Ridership",showlegend = T )%>%add_lines(y =~weekly_ma, name ="7-day Moving Average", line =list(color ='red')) %>%layout(title =list(text="MTA: Daily Total Estimated Ridership"),xaxis =list(title="Date"),yaxis =list(title="Total Daily Public Transit Trips"))fig
This plot likely has a logarithmic trend, with definite seasonality that favors spring/fall months and dips during winter months.
Code
fig <- cta %>%plot_ly(x =~Date,y =~`total_rides`,type ='scatter',mode ='lines',name ="Daily Ridership",showlegend = T )%>%add_lines(y =~weekly_ma, name ="30-day Moving Average", line =list(color ='red')) %>%layout(title =list(text="CTA: Daily Total Estimated Ridership"),xaxis =list(title="Date"),yaxis =list(title="Total Daily Public Transit Trips"))fig
This data has similar elements to the other ridership data. We have clear seasonality that results in low winter ridership, a slow decline prior to 2020, and the familiar drop in 2020 and subsequent climb back up.
Code
cars$observation_date <-as.Date(cars$observation_date, format ="%m/%d/%y")cars <- cars[cars$observation_date >=as.Date("1993-04-01"),]cars <- cars[cars$observation_date <=as.Date("2024-01-01"),]fig <- cars %>%plot_ly(x =~observation_date,y =~M12MTVUSM227NFWA,type ='scatter',mode ='lines',showlegend = F )%>%layout(title =list(text="Monthly Vehicle Miles Driven"),xaxis =list(title="Year"),yaxis =list(title="Vehicle Miles (Millions)"))fig
This data does not seem to have much seasonality or periodicity, just a simple trend that is interrupted by historic events in 2008 and 2020.
Code
gas$Month <-as.Date(paste0(gas$Month, "-01"), format ="%b-%y-%d")gas <- gas[gas$Month >=as.Date("1993-04-01"),]gas <- gas[gas$Month <=as.Date("2024-01-01"),]gas <- gas[order(gas$Month), ]fig <- gas %>%plot_ly(x =~Month,y =~`Price per Gallon`,type ='scatter',mode ='lines',showlegend = F )%>%layout(title =list(text="Monthly Average Gas Prices"),xaxis =list(title="Year"),yaxis =list(title="Average Price (Dollars)"))fig
This plot shows a clear trend upward, but it is far from consistent. Additionally, there is no obvious seasonality here and the variation seems to largely be noise-based and responsive to historic events that change the economic landscape
The seasonality here may be present but is not obvious. What we do see is spikes seem to be rapid, followed by gradual slides back to lower rates. It is most common for unemployment rates to be falling.
Code
plot <-plot_ly(rideshare, x =~Date) %>%add_lines(y =~Uber, name ="Uber Closing Price", line =list(color ='black')) %>%add_lines(y =~Lyft, name ="Lyft Closing Price", line =list(color ='#cc00bb')) %>%layout(title ="Rideshare Stock prices",xaxis =list(title ="Date"),yaxis =list(title ="Price"),legend =list(title =list(text ="Legend")) )plot
Code
plot <-plot_ly(telecom, x =~Date) %>%add_lines(y =~Zoom, name ="Zoom Closing Price", line =list(color ='blue')) %>%add_lines(y =~Microsoft, name ="Microsoft Closing Price", line =list(color ='#008800')) %>%add_lines(y =~Cisco, name ="Cisco Closing Price", line =list(color ='#bb0000')) %>%layout(title ="Telecommunications Stock prices",xaxis =list(title ="Date"),yaxis =list(title ="Price"),legend =list(title =list(text ="Legend")) )plot
Code
plot <-plot_ly(oil, x =~Date) %>%add_lines(y =~BP, name ="BP Closing Price", line =list(color ='#66cc00')) %>%add_lines(y =~Shell, name ="Shell Closing Price", line =list(color ='#cccc00')) %>%add_lines(y =~Exxon, name ="Exxon Closing Price", line =list(color ='#bb0000')) %>%add_lines(y =~Chevron, name ="Chevron Closing Price", line =list(color ='#0000bb')) %>%layout(title ="Oil Stock prices",xaxis =list(title ="Date"),yaxis =list(title ="Price"),legend =list(title =list(text ="Legend")) )plot
There are no obvious time series components present in these financial plots, nor should we expect any for such unpredictable data. To simplify further analysis, we will select one company’s stock for each industry to serve as a representative proxy for how that industry has reacted to economic phenomena over the past five years:
ridership_ts <-ts(ridership$`Total Ridership (000s)`, start=c(1992,1), frequency =4)gglagplot(ridership_ts, do.lines=FALSE) +xlab("Lags")+ylab("Total Ridership (000s)")+ggtitle("Lag Plot for Public Transit Ridership") +theme(axis.text.x=element_text(angle=45, hjust=1)) +theme_bw()
Code
mta_ts <-ts(mta$total_ridership, start=c(2020,3,1), frequency =365.25)gglagplot(mta_ts, do.lines=FALSE) +xlab("Lags")+ylab("Total Ridership")+ggtitle("Lag Plot for MTA (New York City) Public Transit Ridership") +theme(axis.text.x=element_text(angle=45, hjust=1)) +theme_bw()
Code
cta_ts <-ts(cta$total_rides, start=c(2015,1,1), frequency =365.25)gglagplot(mta_ts, do.lines=FALSE) +xlab("Lags")+ylab("Total Ridership")+ggtitle("Lag Plot for CTA (Chicago) Public Transit Ridership") +theme(axis.text.x=element_text(angle=45, hjust=1)) +theme_bw()
Code
cars_ts <-ts(cars$M12MTVUSM227NFWA, start=c(1993,4), frequency =12)gglagplot(cars_ts, do.lines=FALSE) +xlab("Lags")+ylab("Million Vehicle Miles")+ggtitle("Lag Plot for Vehicle Miles") +theme(axis.text.x=element_text(angle=45, hjust=1)) +theme_bw()
Code
gas_ts <-ts(gas$`Price per Gallon`, start=c(1993,4), frequency =12)gglagplot(gas_ts, do.lines=FALSE) +xlab("Lags")+ylab("Average Gas Price")+ggtitle("Lag Plot for Gas Prices") +theme(axis.text.x=element_text(angle=45, hjust=1)) +theme_bw()
Code
unemp_ts <-ts(unemp$UNRATE, start=c(1993,4), frequency =12)gglagplot(unemp_ts, do.lines=FALSE) +xlab("Lags")+ylab("Unemployment Rate")+ggtitle("Lag Plot for Unemployment Rate") +theme(axis.text.x=element_text(angle=45, hjust=1)) +theme_bw()
Uber:
Code
uber_ts <-ts(uber$UBER.Adjusted, start=c(2020,1,1), frequency =365.25)gglagplot(uber_ts, do.lines=FALSE) +xlab("Lags")+ylab("Adjusted Stock Price")+ggtitle("Lag Plot for Uber Stock") +theme(axis.text.x=element_text(angle=45, hjust=1)) +theme_bw()
Zoom:
Code
zoom_ts <-ts(zoom$ZM.Adjusted, start=c(2020,1,1), frequency =365.25)gglagplot(zoom_ts, do.lines=FALSE) +xlab("Lags")+ylab("Adjusted Stock Price")+ggtitle("Lag Plot for Zoom Stock") +theme(axis.text.x=element_text(angle=45, hjust=1)) +theme_bw()
Exxon:
Code
exxon_ts <-ts(exxon$XOM.Adjusted, start=c(2020,1,1), frequency =365.25)gglagplot(exxon_ts, do.lines=FALSE) +xlab("Lags")+ylab("Adjusted Stock Price")+ggtitle("Lag Plot for Exxon Stock") +theme(axis.text.x=element_text(angle=45, hjust=1)) +theme_bw()
All of these lag plots show that there is significant autocorrelation, even at large lag values, for the majority of our data. This means that differencing will likely be required, which we will explore deeper in this section.
Decomposing the data and displaying it in faceted charts confirms many suspicions from the visualization portion. All ridership data has a large amount of seasonality, but the trends do differ from city to city, as well as based on the time frame we are looking at. What is also interesting is the fact that there is noticeable seasonality in vehicle miles, gas prices, and unemployment rate, meaning this component will need to be considered when fitting univariate time series models. Meanwhile, the financial time series each appear to have a yearly seasonal component, which was not obvious from just looking at the raw time series plots.
library(gridExtra)ridership_acf <-ggAcf(ridership_ts)+ggtitle("ACF Plot for Public Transit Ridership") +theme_bw() +geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") ridership_pacf <-ggPacf(ridership_ts)+ggtitle("PACF Plot for Public Transit Ridership") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(ridership_acf, ridership_pacf, nrow=2)
Code
mta_acf <-ggAcf(mta_ts, lag.max =30)+ggtitle("ACF Plot for MTA (New York City) Ridership") +theme_bw() +geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") mta_pacf <-ggPacf(mta_ts)+ggtitle("PACF Plot for MTA (New York City) Ridership") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(mta_acf, mta_pacf, nrow=2)
Code
cta_acf <-ggAcf(cta_ts, lag.max =30)+ggtitle("ACF Plot for CTA (Chicago) Ridership") +theme_bw() +geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") cta_pacf <-ggPacf(cta_ts)+ggtitle("PACF Plot for CTA (Chicago) Ridership") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(cta_acf, cta_pacf, nrow=2)
Code
cars_acf <-ggAcf(cars_ts)+ggtitle("ACF Plot for Vehicle Miles") +theme_bw() +geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") cars_pacf <-ggPacf(cars_ts)+ggtitle("PACF Plot for Vehicle Miles") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(cars_acf, cars_pacf, nrow=2)
Code
gas_acf <-ggAcf(gas_ts)+ggtitle("ACF Plot for Gas Prices") +theme_bw() +geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") gas_pacf <-ggPacf(gas_ts)+ggtitle("PACF Plot for Gas Prices") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(gas_acf, gas_pacf, nrow=2)
Code
unemp_acf <-ggAcf(unemp_ts)+ggtitle("ACF Plot for Unemployment Rate") +theme_bw() +geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") unemp_pacf <-ggPacf(unemp_ts)+ggtitle("PACF Plot for Unemployment Rate") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(unemp_acf, unemp_pacf, nrow=2)
Uber:
Code
uber_acf <-ggAcf(uber_ts, lag.max =30)+ggtitle("ACF Plot for Uber Stock") +theme_bw() +geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") uber_pacf <-ggPacf(uber_ts, lag.max =30)+ggtitle("PACF Plot for Uber Stock") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(uber_acf, uber_pacf, nrow=2)
Zoom:
Code
zoom_acf <-ggAcf(zoom_ts, lag.max =30)+ggtitle("ACF Plot for Zoom Stock") +theme_bw() +geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") zoom_pacf <-ggPacf(zoom_ts, lag.max =30)+ggtitle("PACF Plot for Zoom Stock") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(zoom_acf, zoom_pacf, nrow=2)
Exxon:
Code
exxon_acf <-ggAcf(exxon_ts, lag.max =30)+ggtitle("ACF Plot for Exxon Stock") +theme_bw() +geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") exxon_pacf <-ggPacf(exxon_ts, lag.max =30)+ggtitle("PACF Plot for Exxon Stock") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(exxon_acf, exxon_pacf, nrow=2)
Once again, these ACF and PACF plots confirm the necessity to use data augmentation to make up for non-stationarity. In all cases, there are large ACF values at virtually every lag, indicating significant autocorrelation.
Augmented Dickey-Fuller Tests
While we’re still working with raw data, Augmented Dickey-Fuller (ADF) tests can provide additional insight on the stationarity of our data by providing statistics that evaluate the null hypothesis of non-stationarity against the alternative hypothesis of stationarity.
Augmented Dickey-Fuller Test
data: na.remove(ridership_ts)
Dickey-Fuller = -1.8774, Lag order = 5, p-value = 0.6275
alternative hypothesis: stationary
Code
tseries::adf.test(na.remove(mta_ts))
Augmented Dickey-Fuller Test
data: na.remove(mta_ts)
Dickey-Fuller = -6.554, Lag order = 12, p-value = 0.01
alternative hypothesis: stationary
Code
tseries::adf.test(na.remove(cta_ts))
Augmented Dickey-Fuller Test
data: na.remove(cta_ts)
Dickey-Fuller = -3.8003, Lag order = 15, p-value = 0.01895
alternative hypothesis: stationary
Code
tseries::adf.test(na.remove(cars_ts))
Augmented Dickey-Fuller Test
data: na.remove(cars_ts)
Dickey-Fuller = -2.7693, Lag order = 7, p-value = 0.2523
alternative hypothesis: stationary
Code
tseries::adf.test(na.remove(gas_ts))
Augmented Dickey-Fuller Test
data: na.remove(gas_ts)
Dickey-Fuller = -2.4751, Lag order = 7, p-value = 0.3764
alternative hypothesis: stationary
Code
tseries::adf.test(na.remove(unemp_ts))
Augmented Dickey-Fuller Test
data: na.remove(unemp_ts)
Dickey-Fuller = -2.3338, Lag order = 7, p-value = 0.4361
alternative hypothesis: stationary
Uber:
Code
tseries::adf.test(na.remove(uber_ts))
Augmented Dickey-Fuller Test
data: na.remove(uber_ts)
Dickey-Fuller = -1.5743, Lag order = 11, p-value = 0.7585
alternative hypothesis: stationary
Zoom:
Code
tseries::adf.test(na.remove(zoom_ts))
Augmented Dickey-Fuller Test
data: na.remove(zoom_ts)
Dickey-Fuller = -2.5477, Lag order = 11, p-value = 0.3465
alternative hypothesis: stationary
Exxon:
Code
tseries::adf.test(na.remove(exxon_ts))
Augmented Dickey-Fuller Test
data: na.remove(exxon_ts)
Dickey-Fuller = -2.484, Lag order = 11, p-value = 0.3734
alternative hypothesis: stationary
The city-specific data (New York and Chicago ridership) ADF tests are the only ones here that indicate stationarity, as they have significantly small p-values. We will still look at detrended and differenced data in order to further evaluate which steps are necessary for fitting sufficient time series models.
Detrending and Differencing
The following plots show the following data for each time series:
Residuals of detrended data
Residuals of first-differenced data
Residuals of second-differenced data
ACF plot of original data
ACF plot of detrended data
ACF plot of first-differenced data
ACF plot of second-differenced data
By observing these all together, we can make conclusions based on which data augmentation methods are most necessary for fitting time series models in future sections.
plot1 <-ggAcf(ridership_ts, 48, main="Original Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot2 <-ggAcf(resid(fit_ridership), 48, main="Detrended Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot3 <-ggAcf(diff(ridership_ts), 48, main="First Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot4 <-ggAcf(diff(diff(ridership_ts)), 48, main="Second Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(plot1, plot2, plot3, plot4, nrow=4)
While none of these methods fully remove the seemingly unavoidable variance created by the pandemic, the ACF plots show us that first-difference data removes a great deal of autocorrelation. This indicates that we should consider a d=1 parameter when fitting models.
plot1 <-ggAcf(mta_ts, 48, main="Original Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot2 <-ggAcf(resid(fit_mta), 48, main="Detrended Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot3 <-ggAcf(diff(mta_ts), 48, main="First Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot4 <-ggAcf(diff(diff(mta_ts)), 48, main="Second Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(plot1, plot2, plot3, plot4, nrow=4)
Data augmentation does not seem to mitigate some of the drawbacks from autocorrelation here, as there are large ACF values in each plot. What we can notice is the very obvious seasonality, indicating that a SARIMA model may be most useful for this data.
plot1 <-ggAcf(cta_ts, 48, main="Original Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot2 <-ggAcf(resid(fit_cta), 48, main="Detrended Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot3 <-ggAcf(diff(cta_ts), 48, main="First Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot4 <-ggAcf(diff(diff(cta_ts)), 48, main="Second Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(plot1, plot2, plot3, plot4, nrow=4)
Data augmentation does not seem to mitigate some of the drawbacks from autocorrelation here, as there are large ACF values in each plot. What we can notice is the very obvious seasonality, indicating that a SARIMA model may be most useful for this data.
plot1 <-ggAcf(cars_ts, 48, main="Original Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot2 <-ggAcf(resid(fit_cars), 48, main="Detrended Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot3 <-ggAcf(diff(cars_ts), 48, main="First Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot4 <-ggAcf(diff(diff(cars_ts)), 48, main="Second Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(plot1, plot2, plot3, plot4, nrow=4)
Differencing clearly works here for the purpose of dealing with autocorrelation, although it is difficult to determine the necessity of first-differenced and second-differenced data. These options will be considered while fitting univariate time series models. The seasonal component that we saw above while decomposing is confirmed here.
plot1 <-ggAcf(gas_ts, 48, main="Original Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot2 <-ggAcf(resid(fit_gas), 48, main="Detrended Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot3 <-ggAcf(diff(gas_ts), 48, main="First Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot4 <-ggAcf(diff(diff(gas_ts)), 48, main="Second Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(plot1, plot2, plot3, plot4, nrow=4)
Differencing clearly works here for the purpose of dealing with autocorrelation, although it is difficult to determine the necessity of first-differenced and second-differenced data. These options will be considered while fitting univariate time series models. The seasonal component that we saw above while decomposing is confirmed here.
plot1 <-ggAcf(unemp_ts, 48, main="Original Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot2 <-ggAcf(resid(fit_unemp), 48, main="Detrended Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot3 <-ggAcf(diff(unemp_ts), 48, main="First Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot4 <-ggAcf(diff(diff(unemp_ts)), 48, main="Second Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(plot1, plot2, plot3, plot4, nrow=4)
In this case, first-differenced data appears to account for a great deal of autocorrelation. Thus, a parameter of d=1 should be considered while fitting univariate models.
plot1 <-ggAcf(exxon_ts, 48, main="Original Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot2 <-ggAcf(resid(fit_exxon), 48, main="Detrended Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot3 <-ggAcf(diff(exxon_ts), 48, main="First Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") plot4 <-ggAcf(diff(diff(exxon_ts)), 48, main="Second Differenced Data") +theme_bw()+geom_segment(lineend ="butt", color ="#68A2B9") +geom_hline(yintercept =0, color ="#68A2B9") grid.arrange(plot1, plot2, plot3, plot4, nrow=4)
In all of these cases, one level of differencing appears to take care of the autocorrelation issue, as shown by the ACF plots. This finding will be considered when fitting models.
Moving Average Smoothing
The process of moving average smoothing allows us to visualize data that may otherwise seem noisy and seasonal, and instead focus on the overall trends present. The following plots show this for each time series with three different levels of moving average smoothing, selected based on the expected seasonal periods.
ridership$ma4 <-ma(ridership$`Total Ridership (000s)`, 4)ridership$ma8 <-ma(ridership$`Total Ridership (000s)`, 8)ridership$ma16 <-ma(ridership$`Total Ridership (000s)`, 16)fig <- ridership %>%plot_ly(x =~`Quarter-Year`,y =~`Total Ridership (000s)`,type ='scatter',mode ='lines',name ="Quarterly Ridership",showlegend = T )%>%add_lines(y =~ma4, name ="1-Year Moving Average", line =list(color ='red')) %>%add_lines(y =~ma8, name ="2-Year Moving Average", line =list(color ='orange')) %>%add_lines(y =~ma16, name ="4-Year Moving Average", line =list(color ='#00aa00')) %>%layout(title =list(text="Quarterly Total Public Transit Ridership with Moving Averages"),xaxis =list(title="Date"),yaxis =list(title="Public Transit Trips (000s)"))fig
Code
mta$ma7 <-ma(mta$total_ridership, 7)mta$ma30 <-ma(mta$total_ridership, 30)mta$ma365 <-ma(mta$total_ridership, 365)fig <- mta %>%plot_ly(x =~Date,y =~total_ridership,type ='scatter',mode ='lines',name ="Daily Ridership",showlegend = T )%>%add_lines(y =~ma7, name ="7-Day Moving Average", line =list(color ='red')) %>%add_lines(y =~ma30, name ="30-Day Moving Average", line =list(color ='orange')) %>%add_lines(y =~ma365, name ="365-Day Moving Average", line =list(color ='#00aa00')) %>%layout(title =list(text="Daily New York Ridership with Moving Averages"),xaxis =list(title="Date"),yaxis =list(title="Public Transit Trips"))fig
Code
cta$ma7 <-ma(cta$total_rides, 7)cta$ma30 <-ma(cta$total_rides, 30)cta$ma365 <-ma(cta$total_rides, 365)fig <- cta %>%plot_ly(x =~Date,y =~total_rides,type ='scatter',mode ='lines',name ="Daily Ridership",showlegend = T )%>%add_lines(y =~ma7, name ="7-Day Moving Average", line =list(color ='red')) %>%add_lines(y =~ma30, name ="30-Day Moving Average", line =list(color ='orange')) %>%add_lines(y =~ma365, name ="365-Day Moving Average", line =list(color ='#00aa00')) %>%layout(title =list(text="Daily Chicago Ridership with Moving Averages"),xaxis =list(title="Date"),yaxis =list(title="Public Transit Trips"))fig
Code
cars$ma3 <-ma(cars$M12MTVUSM227NFWA, 3)cars$ma12 <-ma(cars$M12MTVUSM227NFWA, 12)cars$ma24 <-ma(cars$M12MTVUSM227NFWA, 24)fig <- cars %>%plot_ly(x =~observation_date,y =~M12MTVUSM227NFWA,type ='scatter',mode ='lines',name ="Daily Ridership",showlegend = T )%>%add_lines(y =~ma3, name ="3-Month Moving Average", line =list(color ='red')) %>%add_lines(y =~ma12, name ="12-Month Moving Average", line =list(color ='orange')) %>%add_lines(y =~ma24, name ="24-Month Moving Average", line =list(color ='#00aa00')) %>%layout(title =list(text="Monthly Vehicle Miles with Moving Averages"),xaxis =list(title="Date"),yaxis =list(title="Million Vehicle Miles"))fig
Code
gas$ma3 <-ma(gas$`Price per Gallon`, 3)gas$ma12 <-ma(gas$`Price per Gallon`, 12)gas$ma24 <-ma(gas$`Price per Gallon`, 24)fig <- gas %>%plot_ly(x =~Month,y =~`Price per Gallon`,type ='scatter',mode ='lines',name ="Daily Ridership",showlegend = T )%>%add_lines(y =~ma3, name ="3-Month Moving Average", line =list(color ='red')) %>%add_lines(y =~ma12, name ="12-Month Moving Average", line =list(color ='orange')) %>%add_lines(y =~ma24, name ="24-Month Moving Average", line =list(color ='#00aa00')) %>%layout(title =list(text="Monthly Gas Prices with Moving Averages"),xaxis =list(title="Date"),yaxis =list(title="Average Gas Price"))fig
Code
unemp$ma3 <-ma(unemp$UNRATE, 3)unemp$ma12 <-ma(unemp$UNRATE, 12)unemp$ma24 <-ma(unemp$UNRATE, 24)fig <- unemp %>%plot_ly(x =~observation_date,y =~UNRATE,type ='scatter',mode ='lines',name ="Daily Ridership",showlegend = T )%>%add_lines(y =~ma3, name ="3-Month Moving Average", line =list(color ='red')) %>%add_lines(y =~ma12, name ="12-Month Moving Average", line =list(color ='orange')) %>%add_lines(y =~ma24, name ="24-Month Moving Average", line =list(color ='#00aa00')) %>%layout(title =list(text="Monthly Unemployment Rate with Moving Averages"),xaxis =list(title="Date"),yaxis =list(title="Unemployment Rate"))fig
Uber:
Code
uber$ma7 <-ma(uber$UBER.Adjusted, 7)uber$ma30 <-ma(uber$UBER.Adjusted, 30)uber$ma365 <-ma(uber$UBER.Adjusted, 365)fig <- uber %>%plot_ly(x =~date,y =~UBER.Adjusted,type ='scatter',mode ='lines',name ="Uber Stock Price",showlegend = T )%>%add_lines(y =~ma7, name ="7-Day Moving Average", line =list(color ='red')) %>%add_lines(y =~ma30, name ="30-Day Moving Average", line =list(color ='orange')) %>%add_lines(y =~ma365, name ="365-Day Moving Average", line =list(color ='#00aa00')) %>%layout(title =list(text="Daily Uber Stock with Moving Averages"),xaxis =list(title="Date"),yaxis =list(title="Stock Price"))fig
Zoom:
Code
zoom$ma7 <-ma(zoom$ZM.Adjusted, 7)zoom$ma30 <-ma(zoom$ZM.Adjusted, 30)zoom$ma365 <-ma(zoom$ZM.Adjusted, 365)fig <- zoom %>%plot_ly(x =~date,y =~ZM.Adjusted,type ='scatter',mode ='lines',name ="Zoom Stock Price",showlegend = T )%>%add_lines(y =~ma7, name ="7-Day Moving Average", line =list(color ='red')) %>%add_lines(y =~ma30, name ="30-Day Moving Average", line =list(color ='orange')) %>%add_lines(y =~ma365, name ="365-Day Moving Average", line =list(color ='#00aa00')) %>%layout(title =list(text="Daily Zoom Stock with Moving Averages"),xaxis =list(title="Date"),yaxis =list(title="Stock Price"))fig
Exxon:
Code
exxon$ma7 <-ma(exxon$XOM.Adjusted, 7)exxon$ma30 <-ma(exxon$XOM.Adjusted, 30)exxon$ma365 <-ma(exxon$XOM.Adjusted, 365)fig <- exxon %>%plot_ly(x =~date,y =~XOM.Adjusted,type ='scatter',mode ='lines',name ="Exxon Stock Price",showlegend = T )%>%add_lines(y =~ma7, name ="7-Day Moving Average", line =list(color ='red')) %>%add_lines(y =~ma30, name ="30-Day Moving Average", line =list(color ='orange')) %>%add_lines(y =~ma365, name ="365-Day Moving Average", line =list(color ='#00aa00')) %>%layout(title =list(text="Daily Exxon Stock with Moving Averages"),xaxis =list(title="Date"),yaxis =list(title="Stock Price"))fig